Re: [-empyre-] Archives, metadata and searching
Dear Simon and Stephen (and everybody else of course):
I would like to explain briefly the approach of InterPARES to metadata.
In the past couple of years, InterPARES has been building a database
registering existing metadata schemata and analyzing them according to
specific criteria aiming to establish whether a metadata schema is able to
provide evidence that a record is as accurate, reliable and authentic as it
was when first saved. Thus, our analyses are not concerned with the
retrievability of records, other than indirectly, in the sense that, if a
schema is able to satisfy our requirements, it will also be a powerful
retrieval instrument.
Having looked at several schemata and at several case studies of different
types of records creators (in the arts, sciences, and e-gov.) who are
either using existing schemata or generating personalized ones, we have
arrived at the conclusion that every schema is adequate to any purpose if
it allows one to identify the record in context and to establish its integrity.
In other words, every digital entity should have identity metadata and
integrity metadata. The former are the attributes that uniquely identify a
record and distinguish it from any other record. For a letter, they would
be attributes like names of creator (the person in whose archives the
letter is maintained), author (human or organizational person issuing the
letter), addressee, writer (the person articulating the discourse), date on
the document, dates of transmission, receipt and archiving, subject matter,
filing code, filing codes of the previous and subsequent letters, format,
attachments, etc. For a telescope observation record, they would
be attributes like name of star, inclination of telescope, time of
observation, light curve, etc. Every creator should identify what is needed
for identification (and therefore retrieval) of its own records. Integrity
metadata are data about responsibility for the record and for its changes
over time. They include things like name of the person responsible for
handling or for keeping the record, changes made to the record, dates and
results of updates, upgrades, migrations, etc. The purpose is to
demonstrate control over the maintenance process and justify changes. The
reason is that, years later, one wants to be able to demonstrate that the
entity copyrighted or linked to somebody's intellectual rights 10 years
before is the same entity, even if it looks a bit different.
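The split between identity and integrity metadata described above can be
sketched as a small data structure. This is only an illustration under
stated assumptions: the class and field names (LetterIdentity, ChangeEvent,
IntegrityMetadata, Record) are hypothetical and are not part of any
InterPARES schema, and the identity fields are a subset of the letter
attributes listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class LetterIdentity:
    """Identity metadata: attributes that distinguish one letter from any other.

    Hypothetical field names, chosen to mirror the attributes named in the text.
    """
    creator: str            # person in whose archives the letter is maintained
    author: str             # person (human or organizational) issuing the letter
    addressee: str
    writer: str             # person articulating the discourse
    date_on_document: date
    subject: str
    filing_code: str
    previous_filing_code: Optional[str] = None
    next_filing_code: Optional[str] = None


@dataclass
class ChangeEvent:
    """One entry in the integrity trail: who changed what, when, with what result."""
    when: date
    action: str             # e.g. "update", "upgrade", "migration"
    responsible: str
    result: str


@dataclass
class IntegrityMetadata:
    """Integrity metadata: responsibility for the record and its changes over time."""
    keeper: str
    changes: List[ChangeEvent] = field(default_factory=list)

    def record_change(self, when: date, action: str, responsible: str, result: str) -> None:
        self.changes.append(ChangeEvent(when, action, responsible, result))

    def history(self) -> List[ChangeEvent]:
        # Chronological trail, used to justify why the entity looks different today.
        return sorted(self.changes, key=lambda c: c.when)


@dataclass
class Record:
    identity: LetterIdentity
    integrity: IntegrityMetadata


# Illustrative usage with made-up values.
letter = Record(
    identity=LetterIdentity(
        creator="Studio archive", author="A. Artist", addressee="B. Curator",
        writer="A. Artist", date_on_document=date(2004, 2, 1),
        subject="exhibition loan", filing_code="COR-0012",
    ),
    integrity=IntegrityMetadata(keeper="Studio records manager"),
)
letter.integrity.record_change(date(2010, 5, 1), "migration", "Studio records manager",
                               "converted to a newer file format")
```

The identity part is frozen because it is what makes the record that
record; the integrity part grows over time as the record is maintained.
A telescope observation would use a different identity class (star,
inclination, time of observation) with the same integrity structure.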
Now, all the metadata indicated above are the responsibility of the creator
and chosen by the creator. Once the digital record goes to the preserver,
it goes as part of an aggregation of material. The preserver should use
metadata schemata representing the identifying attributes and the integrity
information of the aggregation, not of its individual components. Linked to
the metadata for the aggregation should be all the documentation related to
that unit (name of creator, type and scope of material, historical
development, how the material was originally used, circumstances of
acquisition, internal relationships among its parts, technological
characteristics, other related material, how the preserver has upgraded the
material to keep it accessible, consequent changes, etc.; we call
this archival description), and directions on how to retrieve things once
one is inside the aggregation. Once the aggregation is retrieved by the
user on the basis of the preserver metadata, then the original metadata
schemata of the creator are used to get the specific record.
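The two-step retrieval described above (first the aggregation through the
preserver's metadata, then the record through the creator's own schema)
could be sketched as follows. This is a minimal illustration, not an
InterPARES implementation: the Aggregation class, its fields and the two
lookup functions are hypothetical names, and each record is kept as a
plain dictionary so that every creator can use its own attribute names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Aggregation:
    """A unit acquired by the preserver, described as a whole."""
    creator: str
    scope: str
    description: str                      # stands in for the archival description
    # Records keep the creator's own metadata schema, untouched by the preserver.
    records: List[Dict[str, str]] = field(default_factory=list)


def find_aggregations(holdings: List[Aggregation], **criteria: str) -> List[Aggregation]:
    """Step 1: search on preserver-level metadata only."""
    return [a for a in holdings
            if all(getattr(a, key, None) == value for key, value in criteria.items())]


def find_records(aggregation: Aggregation, **criteria: str) -> List[Dict[str, str]]:
    """Step 2: inside one aggregation, search using the creator's own attributes."""
    return [r for r in aggregation.records
            if all(r.get(key) == value for key, value in criteria.items())]


# Illustrative holdings with made-up values: two creators, two different schemata.
holdings = [
    Aggregation(
        creator="Observatory", scope="observation logs",
        description="Telescope observation records and related documentation",
        records=[{"star": "Vega", "time": "2004-02-01T22:10"},
                 {"star": "Sirius", "time": "2004-02-02T21:45"}],
    ),
    Aggregation(
        creator="Artist studio", scope="correspondence",
        description="Letters received and sent by the studio",
        records=[{"author": "A. Artist", "addressee": "B. Curator",
                  "filing_code": "COR-0012"}],
    ),
]

observatory = find_aggregations(holdings, creator="Observatory")[0]
vega_records = find_records(observatory, star="Vega")
```

The preserver-level search never needs to know that one creator indexes
by star and another by filing code; that knowledge only matters once the
user is inside the aggregation.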
To make a long story short, we do not believe in one size fits all. We
believe that metadata schemata should be built according to the same
principles, but should be different from creator to creator unless the
creators are doing the same things and producing the same records (which is
usually true only in government and some types of businesses). We also
believe that preservers should not be attaching metadata to records, but to
the entire entity that they acquire as a unit, and should not be telling
records creators what metadata to use, other than advising them in general
on the principles that should guide their choice.
With all the above said, on Feb. 20 all InterPARES archival theorists will
get together to sort out the metadata concept on the basis of findings to
date, so everything may change...but not by much...I do not think.
Cheers,
Luciana